Considerations about multistep community detection
The problem and implications of community detection in networks have attracted
considerable attention because of their important applications in both natural and social
sciences. A number of algorithms have been developed to solve this problem,
addressing either speed optimization or the quality of the partitions
calculated. In this paper we propose a multi-step procedure bridging the
fastest but less accurate algorithms (coarse clustering) with the slowest,
most effective ones (refinement). By adopting a heuristic ranking of the nodes
and classifying a fraction of them as 'critical', a refinement step can be
restricted to this subset of the network, thus saving computational time.
Preliminary numerical results are discussed, showing improvement of the final
partition. Comment: 12 pages
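The multi-step idea can be illustrated with a small sketch. The abstract does not specify the ranking heuristic or the refinement rule, so both are assumptions here: nodes are ranked by the fraction of neighbours outside their own community (the hypothetical `boundary_score`), and each critical node is moved to the community that holds most of its neighbours (the hypothetical `refine_critical`).

```python
from collections import Counter


def boundary_score(graph, labels, node):
    """Assumed heuristic: fraction of a node's neighbours outside its community."""
    nbrs = graph[node]
    if not nbrs:
        return 0.0
    outside = sum(1 for n in nbrs if labels[n] != labels[node])
    return outside / len(nbrs)


def refine_critical(graph, labels, fraction=0.25):
    """Re-assign only the top `fraction` of nodes ranked by boundary_score."""
    ranked = sorted(graph, key=lambda v: boundary_score(graph, labels, v),
                    reverse=True)
    n_critical = max(1, int(len(ranked) * fraction))
    critical = ranked[:n_critical]
    new_labels = dict(labels)
    for v in critical:
        counts = Counter(new_labels[n] for n in graph[v])
        if not counts:
            continue
        best, best_count = counts.most_common(1)[0]
        # Assumed refinement rule: move only if another community strictly wins.
        if best_count > counts.get(new_labels[v], 0):
            new_labels[v] = best
    return new_labels, critical


# Two triangles (0,1,2) and (3,4,5) joined by the edge 2-3.
graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
# A coarse partition that misplaces node 2.
coarse = {0: "A", 1: "A", 2: "B", 3: "B", 4: "B", 5: "B"}
refined, critical = refine_critical(graph, coarse, fraction=0.25)
print(critical, refined)
```

Only the critical subset is revisited, which is where the saving in computational time would come from; a real refinement step would use a stronger, slower method on that subset.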
An automatic method to generate domain-specific investigator networks using PubMed abstracts
Background: Collaboration among investigators has become critical to scientific research. This includes ad hoc collaboration established through personal contacts as well as formal consortia established by funding agencies. Continued growth in online resources for scientific research and communication has promoted the development of highly networked research communities. Extending these networks globally requires identifying additional investigators in a given domain, profiling their research interests, and collecting current contact information. We present a novel strategy for building investigator networks dynamically and producing detailed investigator profiles using data available in PubMed abstracts.
Results: We developed a novel strategy to obtain detailed investigator information by automatically parsing the affiliation string in PubMed records. We illustrated the results by using a published literature database in human genome epidemiology (HuGE Pub Lit) as a test case. Our parsing strategy extracted country information from 92.1% of the affiliation strings in a random sample of PubMed records and in 97.0% of HuGE records, with accuracies of 94.0% and 91.0%, respectively. Institution information was parsed from 91.3% of the general PubMed records (accuracy 86.8%) and from 94.2% of HuGE PubMed records (accuracy 87.0%). We demonstrated the application of our approach to dynamic creation of investigator networks by creating a prototype information system containing a large database of PubMed abstracts relevant to human genome epidemiology (HuGE Pub Lit), indexed using PubMed medical subject headings converted to Unified Medical Language System concepts. Our method was able to identify 70–90% of the investigators/collaborators in three different human genetics fields; it also successfully identified 9 of 10 genetics investigators within the PREBIC network, an existing preterm birth research network.
Conclusion: We successfully created a web-based prototype capable of creating domain-specific investigator networks based on an application that accurately generates detailed investigator profiles from PubMed abstracts combined with robust standard vocabularies. This approach could be used for other biomedical fields to efficiently establish domain-specific investigator networks.
Outlier Edge Detection Using Random Graph Generation Models and Applications
Outliers are samples that are generated by different mechanisms from other
normal data samples. Graphs, in particular social network graphs, may contain
nodes and edges that are made by scammers, malicious programs or mistakenly by
normal users. Detecting outlier nodes and edges is important for data mining
and graph analytics. However, previous research in the field has focused mainly
on detecting outlier nodes. In this article, we study the properties of edges
and propose outlier edge detection algorithms using two random graph generation
models. We found that the edge-ego-network, which can be defined as the induced
graph that contains two end nodes of an edge, their neighboring nodes and the
edges that link these nodes, contains critical information to detect outlier
edges. We evaluated the proposed algorithms by injecting outlier edges into
some real-world graph data. Experimental results show that the proposed
algorithms can effectively detect outlier edges. In particular, the algorithm
based on the Preferential Attachment Random Graph Generation model consistently
gives good performance regardless of the test graph data. Furthermore, the
proposed algorithms are not limited to the area of outlier edge detection. We
demonstrate three different applications that benefit from the proposed
algorithms: 1) a preprocessing tool that improves the performance of graph
clustering algorithms; 2) an outlier node detection algorithm; and 3) a novel
noisy data clustering algorithm. These applications show the great potential of
the proposed outlier edge detection techniques. Comment: 14 pages, 5 figures, journal paper
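The edge-ego-network defined above can be extracted in a few lines. This is a minimal sketch that assumes an undirected graph stored as a dictionary mapping each node to the set of its neighbours; the function name `edge_ego_network` is illustrative and not taken from the paper, and the scoring of edges against a random graph model is not shown.

```python
def edge_ego_network(graph, u, v):
    """Return (nodes, edges) of the subgraph induced by u, v and their neighbours."""
    nodes = {u, v} | graph[u] | graph[v]
    # Keep every edge whose two end nodes both lie inside the node set.
    edges = {frozenset((a, b)) for a in nodes for b in graph[a] if b in nodes}
    return nodes, edges


# Example graph with edges 1-2, 1-3, 2-3, 2-4, 4-5, 5-6.
graph = {
    1: {2, 3},
    2: {1, 3, 4},
    3: {1, 2},
    4: {2, 5},
    5: {4, 6},
    6: {5},
}
nodes, edges = edge_ego_network(graph, 1, 2)
print(sorted(nodes), len(edges))
```

For the edge 1-2, nodes 5 and 6 fall outside the edge-ego-network because neither is adjacent to an end node of the edge.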
Network 'small-world-ness': a quantitative method for determining canonical network equivalence
Background: Many technological, biological, social, and information networks fall into the broad class of 'small-world' networks: they have tightly interconnected clusters of nodes, and a mean shortest path length that is similar to that of a matched random graph (same number of nodes and edges). This semi-quantitative definition leads to a categorical distinction ('small/not-small') rather than a quantitative, continuous grading of networks, and can lead to uncertainty about a network's small-world status. Moreover, systems described by small-world networks are often studied using an equivalent canonical network model, the Watts-Strogatz (WS) model. However, the process of establishing an equivalent WS model is imprecise and there is a pressing need to discover ways in which this equivalence may be quantified.
Methodology/Principal Findings: We defined a precise measure of 'small-world-ness' S based on the trade-off between high local clustering and short path length. A network is now deemed a 'small-world' if S > 1, an assertion which may be tested statistically. We then examined the behavior of S on a large data-set of real-world systems. We found that all these systems were linked by a linear relationship between their S values and the network size n. Moreover, we show a method for assigning a unique Watts-Strogatz (WS) model to any real-world network, and show analytically that the WS models associated with our sample of networks also show linearity between S and n. Linearity between S and n is not, however, inevitable, and neither is S maximal for an arbitrary network of given size. Linearity may, however, be explained by a common limiting growth process.
Conclusions/Significance: We have shown how the notion of a small-world network may be quantified. Several key properties of the metric are described and the use of WS canonical models is placed on a more secure footing.
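The abstract does not state the formula for S. A form consistent with the trade-off it describes is the ratio of the clustering gain to the path-length penalty, both measured against a matched random graph. The sketch below assumes that form; the function and parameter names are illustrative, and the input values are made up for the example rather than taken from any data set.

```python
def small_world_ness(c, l, c_rand, l_rand):
    """Assumed form: S = (C / C_rand) / (L / L_rand).

    c, l           -- clustering coefficient and mean shortest path length
                      of the network under study
    c_rand, l_rand -- the same quantities for a random graph with the same
                      number of nodes and edges
    """
    gamma = c / c_rand   # how much more clustered than the random graph
    lam = l / l_rand     # how much longer the paths are than in the random graph
    return gamma / lam


# Illustrative numbers only.
s = small_world_ness(c=0.5, l=3.0, c_rand=0.05, l_rand=2.5)
print(s, s > 1)
```

A network that is much more clustered than its random counterpart while keeping a comparable path length gives S well above 1.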
Seeds Buffering for Information Spreading Processes
Seeding strategies for influence maximization in social networks have been
studied for more than a decade. They have mainly relied on the activation of
all resources (seeds) simultaneously at the beginning; yet, it has been shown
that sequential seeding strategies commonly perform better. This research focuses
on studying sequential seeding with buffering, which is an extension of the basic
sequential seeding concept. The proposed method avoids choosing nodes that will
be activated through the natural diffusion process, which leads to better
use of the budget for activating seed nodes in the social influence process.
This approach was compared with sequential seeding without buffering and single
stage seeding. The results on both real and artificial social networks confirm
that the buffer-based consecutive seeding is a good trade-off between the final
coverage and the time to reach it. It performs significantly better than its
rivals for a fixed budget. The gain is obtained by dynamic rankings and the
ability to detect network areas with nodes that are not yet activated and have
high potential of activating their neighbours. Comment: Jankowski, J., Bródka, P., Michalski, R., & Kazienko, P. (2017,
September). Seeds Buffering for Information Spreading Processes. In
International Conference on Social Informatics (pp. 628-641). Springer
The Routing of Complex Contagion in Kleinberg's Small-World Networks
In Kleinberg's small-world network model, strong ties are modeled as
deterministic edges in the underlying base grid and weak ties are modeled as
random edges connecting remote nodes. The probability of connecting a node $u$
with node $v$ through a weak tie is proportional to $1/|uv|^{\alpha}$, where
$|uv|$ is the grid distance between $u$ and $v$ and $\alpha$ is the
parameter of the model. Complex contagion refers to the propagation mechanism
in a network where each node is activated only after $k \ge 2$ neighbors of the
node are activated.
In this paper, we propose the concept of routing of complex contagion (or
complex routing), where we can activate one node at one time step with the goal
of activating the targeted node in the end. We consider a decentralized routing
scheme where only the weak ties from the activated nodes are revealed. We study
the routing time of complex contagion and compare the result with simple
routing and complex diffusion (the diffusion of complex contagion, where all
nodes that could be activated are activated immediately in the same step with
the goal of activating all nodes in the end).
We show that for decentralized complex routing, the routing time is lower
bounded by a polynomial in $n$ (the number of nodes in the network) for the whole
range of $\alpha$, both in expectation and with high probability (with separate
explicit bounds in expectation for two ranges of $\alpha$),
while the routing time of simple contagion has a polylogarithmic upper bound when
$\alpha = 2$. Our results indicate that complex routing is harder than complex
diffusion and that the routing time of complex contagion differs exponentially
from that of simple contagion at the sweet spot. Comment: Conference version will appear in COCOON 201
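The difference between simple and complex contagion can be seen in a small simulation of complex diffusion, in which every node that can be activated is activated in the same step. This is a sketch under simplifying assumptions: the graph is a fixed deterministic strip rather than Kleinberg's grid with random weak ties, and the names `strip_graph` and `complex_diffusion` are illustrative.

```python
def strip_graph(n):
    """Nodes 0..n-1, each linked to the next two nodes (i+1 and i+2)."""
    graph = {i: set() for i in range(n)}
    for i in range(n):
        for j in (i + 1, i + 2):
            if j < n:
                graph[i].add(j)
                graph[j].add(i)
    return graph


def complex_diffusion(graph, seeds, k=2):
    """Activate, in synchronous rounds, every node with at least k active neighbours.

    Returns the final active set and the number of rounds that activated
    at least one new node.
    """
    active = set(seeds)
    rounds = 0
    while True:
        newly = {
            v for v in graph
            if v not in active
            and sum(1 for n in graph[v] if n in active) >= k
        }
        if not newly:
            return active, rounds
        active |= newly
        rounds += 1


g = strip_graph(6)
print(complex_diffusion(g, {0, 1}, k=2))  # complex contagion, threshold 2
print(complex_diffusion(g, {0, 1}, k=1))  # simple contagion, threshold 1
print(complex_diffusion(g, {0}, k=2))     # a single seed cannot start a k=2 cascade
```

With threshold 2 the cascade advances one node per round on this strip, while with threshold 1 it advances two nodes per round, which gives a small-scale picture of why complex contagion spreads more slowly.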
Semi-Supervised Overlapping Community Finding based on Label Propagation with Pairwise Constraints
Algorithms for detecting communities in complex networks are generally
unsupervised, relying solely on the structure of the network. However, these
methods can often fail to uncover meaningful groupings that reflect the
underlying communities in the data, particularly when those structures are
highly overlapping. One way to improve the usefulness of these algorithms is by
incorporating additional background information, which can be used as a source
of constraints to direct the community detection process. In this work, we
explore the potential of semi-supervised strategies to improve algorithms for
finding overlapping communities in networks. Specifically, we propose a new
method, based on label propagation, for finding communities using a limited
number of pairwise constraints. Evaluations on synthetic and real-world
datasets demonstrate the potential of this approach for uncovering meaningful
community structures in cases where each node can potentially belong to more
than one community. Comment: Fix table
Structural Properties of Ego Networks
The structure of real-world social networks in large part determines the
evolution of social phenomena, including opinion formation, diffusion of
information and influence, and the spread of disease. Globally, network
structure is characterized by features such as degree distribution, degree
assortativity, and clustering coefficient. However, information about global
structure is usually not available to each vertex. Instead, each vertex's
knowledge is generally limited to the locally observable portion of the network
consisting of the subgraph over its immediate neighbors. Such subgraphs, known
as ego networks, have properties that can differ substantially from those of
the global network. In this paper, we study the structural properties of ego
networks and show how they relate to the global properties of networks from
which they are derived. Through empirical comparisons and mathematical
derivations, we show that structural features, similar to static attributes,
suffer from paradoxes. We quantify the differences between global information
about network structure and local estimates. This knowledge allows us to better
identify and correct the biases arising from incomplete local information. Comment: Accepted by SBP 2015, to appear in the proceedings
The Parameterized Complexity of Centrality Improvement in Networks
The centrality of a vertex v in a network intuitively captures how important
v is for communication in the network. The task of improving the centrality of
a vertex has many applications, as a higher centrality often implies a larger
impact on the network or less transportation or administration cost. In this
work we study the parameterized complexity of the NP-complete problems
Closeness Improvement and Betweenness Improvement in which we ask to improve a
given vertex's closeness or betweenness centrality by a given amount through
adding a given number of edges to the network. Herein, the closeness of a
vertex v sums the multiplicative inverses of distances of other vertices to v
and the betweenness sums for each pair of vertices the fraction of shortest
paths going through v. Unfortunately, for the natural parameter "number of
edges to add" we obtain hardness results, even in rather restricted cases. On
the positive side, we also give an island of tractability for the parameter
measuring the vertex deletion distance to cluster graphs.
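The closeness definition used above (the sum of the multiplicative inverses of the distances from the other vertices to v) is easy to compute with a breadth-first search. The sketch below assumes an unweighted, undirected graph stored as a dictionary of neighbour sets, and assumes that unreachable vertices contribute zero; the helper names are illustrative. It shows the effect of adding a single edge, not the search for the best edges to add.

```python
from collections import deque


def closeness(graph, v):
    """Sum of 1/d(u, v) over all vertices u != v reachable from v."""
    dist = {v: 0}
    queue = deque([v])
    while queue:
        x = queue.popleft()
        for y in graph[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return sum(1.0 / d for node, d in dist.items() if node != v)


def with_edge(graph, a, b):
    """Return a copy of the graph with the undirected edge a-b added."""
    new_graph = {node: set(nbrs) for node, nbrs in graph.items()}
    new_graph[a].add(b)
    new_graph[b].add(a)
    return new_graph


# Path 0 - 1 - 2 - 3.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
before = closeness(path, 0)                   # 1 + 1/2 + 1/3
after = closeness(with_edge(path, 0, 3), 0)   # 1 + 1 + 1/2
print(before, after)
```

Adding the edge 0-3 shortens the distance from vertex 0 to vertices 2 and 3, so the closeness of vertex 0 increases.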